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Abstract 

This paper studies the joint support recovery of similar sparse vectors on the basis of a 
limited number of noisy linear measurements, ie., in a multiple measurement vector (MMV) 
model. The additive noise signals on each measurement vector are assumed to be Gaussian 
and to exhibit different variances. The simultaneous orthogonal matching pursuit (SOMP) 
algorithm is generalized to weight the impact of each measurement vector on the choice of 
the atoms to be picked according to their noise levels. The new algorithm is referred to as 
SOMP-NS where NS stands for noise stabilization. 

To begin with, a theoretical framework to analyze the performance of the proposed 
algorithm is developed. This framework is then used to build conservative lower bounds on 
the probability of partial or full joint support recovery. Numerical simulations show that 
the proposed algorithm outperforms SOMP and that the theoretical lower bound provides 
a great insight into how SOMP-NS behaves when the weighting strategy is modified. 


1 Introduction 

The recovery of sparse signals of high dimensions on the basis of noisy linear measurements 
is an important problem in the field of signal acquisition and processing. When the number 
of linear observations is significantly lower than the dimension of the signal to be recovered, 
the signal recovery may exploit the property of sparsity to deliver correct results. The field of 
research that studies such problems is often referred to as compressed sensing or compressive 
sensing (CS) [TT| . 

Several computationally tractable methods to address CS problems have been developed 
in the last two decades m m El 1211 Ea EiHai]. Among them, greedy methods prove to be 
valuable choices as their complexity is significantly lower than that of algorithms based on £i- 
minimization j34| . 


While many CS problems involve only one sparse signal and the corresponding measurement 
vector, i.e., the vector gathering all the linear observations of this signal, some applications either 
require or at least benefit from the presence of several sparse signals and measurement vectors. 
Examples of such applications are available in Section 2.3 Models involving one measurement 
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vector are referred to as single measurement vector (SMV) models while multiple measurement 
vector (MMV) models involve at least two measurement vectors |17) . 

When the snpports of the sparse signals are similar, it is possible to improve the reliability 
of the recovery by making joint decisions to determine the estimated support [8l [22]. Thereby, 
all the measurement vectors intervene in the estimation of the support and the final support 
is common to all the sparse vectors. Algorithms performing joint recovery are also capable to 
weaken the influence of additive measurement noise on the performance provided that the noise 
signals are statistically independent and exhibit some degree of isotropy. 

Orthogonal matching pursuit (OMP) is one of the most extensively used greedy algorithm 
designed to solve SMV problems |25j . Among several greedy algorithms conceived to deal with 
multiple measurement vectors, the extension of OMP to the MMV paradigm, referred to as 
simultaneous orthogonal matching pursuit (SOMP), is of great interest as it remains simple, 
both conceptually and algorithmically [32]. 

1.1 Motivation Objective 

The classical SOMP algorithm does not account for the possibly different measurement vector 
noise levels. In some sense, all the measurement vectors are considered equally worthy. How¬ 
ever, it is clear that an optimal joint support recovery method should necessarily take into 
account the noise levels by accordingly weighting the impact of each measurement vector on 
the decisions that are taken. The first aim of this paper is to extend SOMP by gifting it with 
weighting capabilities. The new algorithm will be referred to as SOMP with noise stabilization 
(SOMP-NS) and basically extends the decision metric of SOMP to weight the impact of each 
measurement vector onto the decisions that are taken. 

The second objective is to provide theoretical and numerical evidence that the proposed 
algorithm indeed enables one to achieve higher performance than the other greedy alterna¬ 
tives when the noise levels, or more generally the signal-to-noise ratios (SNR), vary from one 
measurement vector to another. 


1.2 Detailed contribution 


We study partial and full support recovery guarantees of SOMP-NS for a MMV signal model 
incorporating arbitrary sparse signals to be recovered and statistically independent additive 
Gaussian noise vectors exhibiting diagonal covariance matrices, i.e., the entries within each 
vector are statistically independent. It is assumed that the variances of the entries within each 
noise vector are identical although they may be different for each measurement vector. The 


signal model is thoroughly detailed in Section 2.2 


Our first contribution is the proposal of SOMP-NS which generalizes SOMP by weighting 
the measurement vectors. The second contribution is a novel theoretical analysis of SOMP and 
SOMP-NS in the presence of additive Gaussian noise on the measurements. To the best of the 
authors’ knowledge, the theoretical analysis in this paper has never been proposed, neither for 
SOMP nor for SOMP-NS. 


Finally, numerical simulations will show that the weighting capabilities of SOMP-NS enable 
one to improve the performance with regards to SOMP when the noise vectors exhibit different 
powers. The numerical results will also provide evidence that the theoretical analysis accurately 
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depicts key characteristics of SOMP-NS. In particular, closed-form formulas for the optimal 
weights will be derived from the theoretical analysis and will be compared to the simulation 
results. 


1.3 Related work 

Several authors have worked on similar problems. The study of full support recovery guarantees 
for OMP with (.2 or .^oo-bounded noises as well as with Gaussian noises has been performed in 
[ 3 ]. The authors of [3] also provided conditions on the stopping criterion to ensure that OMP 
stops after having picked all the correct atoms. 

Our analysis is similar to that performed by Tropp in [33] for convex programming methods 
in a SMV setting. Together with Gilbert jM], they analyzed the probability of full support 
recovery by means of OMP for Gaussian measurement matrices in the noiseless case. Their 
result has subsequently been refined by Fletcher and Rangan in |20l [28] to account for additive 
measurement noise by means of a high-SNR analysis, i.e., it is assumed that the signal-to-noise 
ratio scales to infinity. All of the papers discussed so far only focus on the SMV framework. 

The theoretical analysis of our paper is partially inspired from [3| and has been generalized 
to the MMV framework. It is worth pointing out that our analysis does not require the high 
SNR assumption of |2ni[28|, properly captures the benefits provided by multiple measurement 
vectors but nevertheless presents some inaccuracies that are to be discussed at the end of this 
paper. 

Gribonval et al. have performed an analysis of SOMP for a problem similar to ours in |22| . 
They were interested in providing a lower bound on the probability of correct support recovery 
when the signal to be estimated is sparse and its non-zero entries are statistically independent 
mean-zero Gaussian random variables exhibiting possibly different variances. 


While our statistical analysis considers the additive measurement noise as a random variable 
and the sparse signals to be recovered as deterministic quantities, the results obtained in m 
use the opposite approach, he., the sparse signals are random and the noise is deterministic. 
Thus, the problem addressed in our paper differs from that presented in [25| but both papers use 
similar mathematical tools and the criteria to ensure full support recovery with high probability 


share analogous properties. This last remark will be further discussed in Section 6.5 


1.4 Outline 

First of all. Section [^progressively introduces the context, provides a detailed description of the 
signal model and depicts an associated application. Afterwards, Section [^ provides descriptions 
of SOMP and SOMP-NS. 


Before deriving the theoretical analysis. Section [^ introduces the mathematical tools neces¬ 
sary for its execution. Section [^ then provides general theoretical results on the proper recovery 
of sparse vectors by means of SOMP-NS. On the basis of the results from Section]^ we show 
in Section [^ that, for Gaussian noises, the probability of failure of SOMP-NS decreases expo¬ 
nentially with regards to the number of measurement vectors. 

In Section [^ extensive numerical simulations show that adequate weighting strategies en¬ 
able SOMP-NS to outperform SOMP whenever the noise variances for each measurement vector 
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are different. Also, a closed-form weighting strategy is derived from the theoretical analysis of 
the previous sections and these weights are compared to the optimal ones obtained by sim¬ 
ulation. Finally, the simulation results show which aspects of the behavior of SOMP-NS are 
properly captured by the proposed theoretical analysis. The reasons why our analysis fails to 
capture some characteristics of SOMP are discussed and potential workarounds are proposed 
for investigation. 

1.5 Conventions 

We find preferable to introduce here the common notations used in this paper. First of all, 
Vn G N, [n] := {1, 2,..., n}. For any set A, | A| refers to its cardinality. Matrices are denoted 
by upper case bold letters while vectors are written in lower case bold letters. The tranpose 
operation is denoted by the superscript For x G M"', Xi denotes the Tth component of x. 
Similarly, the j-th column vector of any matrix is denoted by the corresponding lower case 
bold letter with subscript j, e.g., the j-th column of $ is cjjj. The (i, j)-th entry of a matrix 
is denoted by the upper case letter with subscript i,j, e.g., the (i, j)-th entry of $ is written 
The £p norm (1 < p < oo) of 0 G is defined as ||0||p := Moreover, 

the £oo norm of vector 0 G M*” is defined as ||0||oo := 00 For any matrix $ G 

span($) denotes the space spanned by its column vectors. Also, the trace of $ is tr($) and 
the Frobenius norm of $ is denoted by ||^||f- For S C [n], $5 denotes the submatrix of 
$ that comprises its columns indexed by S. Likewise, xs is the subvector of x comprising 
only the components indexed within S. The Moore-Penrose pseudoinverse of any matrix $ 
is denoted by The orthogonal complement of a vector subspace A is given by A-^. For 
any random variable X, its cumulative density function (CDF) is denoted by Fx while its 
probability density function (PDF) is written fx (when it exists). Similarly, the joint CDF and 
PDF of the random variables Xi,... ,Xk are written as Fx^^...,Xk fxi,...,XK respectively. 
The probability measure is given by P while the mathematical expectation is denoted by E. 
The statistical independence symbol is written _LL. 

2 Signal model 

2.1 Context 

We define the support of a vector x as supp(a;) := {i : x* / 0}. The £0 “norm”[^of a vector 
X is defined as ||a;||o = |supp(a;)|. Loosely speaking, we say that x G M"" is sparse whenever 
Iloilo 'C n. Moreover, x is said to be s-sparse whenever ||a;||o < s. 

Let us consider a collection of K signals G < k < K) that are sparse in an or¬ 
thonormal basis, i.e., = AfXk where G represents the orthonormal basis and x^ G M"" 

is the sparse coefficient vector of expressed in 

We now consider a unique linear measurement process to recover each one of the K sparse 
coefficient vector x^. Moreover, the measurement process is assumed to deliver a number of 
observations m significantly lower than n. Additive measurement noise vectors G are 
also accounted for. Formally, the latter statements rewrite 

y,, = ^^Xk + Bk ( 1 ) 

^This is an abuse of language as this mathematical object does not satisfy the properties of a norm 
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where the measurement matrix $ G j^mxn (denotes the linear measurement process and the 
measurement veetors {yk}i<k<K gather the observations. 

Since the number of observations m is lower than n, arbitrary vectors cannot be recov¬ 
ered from yj^, even in the noiseless case. However, in the noiseless case, it has been shown 
[IMIEH] that Xk can be recovered provided that it is sparse enough, i.e., the cardinality of 
its support is below a certain threshold, and that the measurement matrix $ exhibits specific 
properties such as the restricted isometry property that is described afterwards. 

The orthonormal basis T' is often assumed to be the canonical basis. The reason that 
explains this simplification relies on the fact the the measurement matrix $ is usually generated 
as a realization of a subgaussian random matrix. It can be shown that such random matrices are 
well-condtionned for sparse support recovery, even when multiplied by orthonormal matrices, 
i.e., remains a subgaussian random matrix. This phenomenon is more thoroughly 

discussed in |21t Section 9.1]. Thereby, the signal model will be simplified to 

yk = + efc. (2) 

If several sparse signals Xf. share similar supports, then it is interesting to simultaneously 
recover their joint support Ui<fc<i 4 ' supp(a;fc) instead of performing K independent and possibly 
different estimations of the support of every vector x^. The reason that explains why such a 
strategy yields performance improvements is twofold: 

1. It has been shown [22] that when the sparse vectors x^ share a similar support whose 
associated entries are highly variable from one vector to another, then the probability of 
correct support recovery increases with the number of available measurement vectors K. 

2. In the noisy setting, the vectors are often statistically independent and isotropic which 
suggests that a joint support recovery procedure could be capable of filtering them. This 
property is part of what the theoretical results and simulations of this paper establish. 

Once the joint support has been recovered, Xk is easily recovered by solving a least squares 
problem involving only the non-zero entries oi x^ and the associated column vectors from 

2.2 Signal model 

We wish to provide here a formal and precise statement of the signal model to be used in the 
rest of this paper. 

As mentioned previously, we consider K measurement vectors y^. that are generated on the 
basis of K sparse signals x^ whose supports are similar. We consider the following MMV model: 

Y = + E (3) 

where Y = • • • , y^) is composed of the K measurement vectors G M™. X = [xi, • • • , xk) 

comprises the K sparse signals x^ G M"'. Finally, the columns of E = (ei, • • • , ex) correspond 
to measurement errors. Each error vector is distributed as and all of them 

are statistically independent. The purpose of Q is to aggregate the K equations defined in 

into a single relation. This representation will be preferred throughout the rest of this paper. 

Before going further, we will point out that the mathematical problem of the joint sup¬ 
port recovery is equivalent to finding the columns of $ that enable one to fully express 


5 


Thereby, $ may be seen as a dictionary matrix whose columns {(f>j}j&[n] 3-re the atoms of the 
associated dictionary. The problem of joint support recovery then boils down to determining 
which atoms to choose to simultaneously express the K measurement vectors as their linear 
combinations. Although viewing $ as a measurement process is well suited to the description 
of typical applications, the dictionary approach is more appropriate for the sake of presenting 
the mathematical results and will thus be adopted for the rest of this paper. However, the term 
measurement vector will not be replaced to stay consistent with the standard MMV terminology. 

The dictionary matrix $ is assumed to satisfy the restricted isometry property of an order 
equal to or higher than the cardinality of the joint support S = U^^supp(a;fc). Moreover, it 
will be assumed that each column of this matrix exhibits a unit £2 norm. We briefly review 
two procedures to obtain a dictionary matrix that satisfies both properties above with high 
probability: 

1. Generate a random Gaussian matrix and then normalize the £2 norm of its columns. In 
such a way, the atoms are uniformly distributed on the unit hypersphere of dimension 
n. It is possible to show that this class of random matrices satisfies the restricted isometry 
property with high probability (see [1] for example). 

2. Generate a matrix whose entries are statistically independent Rademacher random vari¬ 
ables. Each column of the resulting matrix is then normalized by multiplying the matrix 
by Ijy/m to obtain atoms exhibiting unit £2 norms. 

2.3 Applications 

The typical scenario associated with signal models and (§ is depicted in Figure The 
idea is to observe a physical quantity, e.g., a chemical composition, a wireless signal, etc. at 
different locations and/or time instants by means of K sensing nodes whose only purpose 
is to acquire observations, i.e., measurement vectors, and repatriate them to a central node 
(GN) that will simultaneously process all the data. In such a configuration, the sensing nodes 
are generally cheap and exhibit very limited computational capabilities and power consumption 
while the central node is more costly because it has to deliver higher performance. 


^2 yk = ^Xk + Bk 



Figure 1: First scenario -- K nodes with different noise levels generate K measurement vectors 
on which joint estimation of the sparse support U^^supp(a;fc) can be performed 


Although several applications of the signal model ([^ are presented in [21 Section 3.3], we 
propose to take as an example the spectrum sensing problem. Spectrum sensing aims at scan¬ 
ning multi-gigahertz electromagnetic (EM) spectrums at a rate that is below that of Nyquist. 


6 


The reason that motivates this objective is that, although most of the available frequency bands 
are licensed to specific users and thus costly to acquire, it has been observed that the spectrum 
occupancy is limited, i.e., the spectrum is sparse in the frequency domain at a given time in¬ 
stant and location. Therefore, it would be interesting to observe this spectrum through an 
appropriate linear measurement process, as described in Equation Q, and then use algorithms 
tailored for CS problems to determine, in real-time, which frequency bands are free and can be 
used to transmit information. 

In this application, each entry of the sparse signals Xk represents the power of a given fre¬ 
quency band. Since most of the spectrum is assumed to be unused at a given time instant 
and location, the vectors Xk are expected to be sparse. Although the nodes should ideally be 
exposed to the same spectrum, this is not the case in practice because of the Rayleigh fading 
that strongly attenuates some frequency bands, thus being invisible to some nodes. In multiple 
input multiple output (MIMO) wireless communications, this issue is circumvented by placing 
the receivers sufficiently far away from one another so that the fading becomes statistically 
independent for each node and thus highly unlikely to strongly attenuate the same frequency 
band for every node. In a likewise fashion, the same solution will work for our framework since 
occasional “holes” will reveal to be a nonissue when performing joint decisions. 

Finally, the different nodes may exhibit different noise levels because of discrepancies in the 
fabrication process or because the hardware components (e. 5 ., amplifiers, multipliers, filters) of 
each node are different. This last remark justifies the multiple noise variances hypothesis of this 
paper. 


3 Towards a weighted greedy algorithm 


The two next subsections present two methods for addressing the problem of joint support 
recovery envisioned in Section 2.2 The first method, SOMP, is standard in the literature but 


does not include noise stabilization. The second method, our contribution, generalizes the first 
one by multiplying each measurement vector {1 <k < K) by a weight > 0. 


3.1 Simultaneous orthogonal matching pursuit 

The original OMP algorithm [251 130] has been generalized in several ways to deal with matrix 
signals Y E i.e., MMV problems. Simultaneous orthogonal matching pursuit (SOMP) 

is a possible generalization of OMP [HI EU [32] . 

SOMP is a greedy algorithm that provides approximate solutions to the joint support re¬ 
covery problem by successively picking atoms from $ to simultaneously approximate the K 
measurement vectors G SOMP is described in Algorithm]^ 

We now explain how SOMP proceeds. The residual at iteration t, denoted by consists 
of the projection of each one of the original signals yj^ onto the orthogonal complement of 
span($ 5 j. In such a way, the residual is orthogonal to every atom that has been chosen so 
far. Initially, the residual is chosen equal to the original signal. The decision on which atom to 
choose (step 4) is based on the sum of the inner products of every atom cf)j with each residual 


7 



Algorithm 1: 

Simulatenous orthogonal matching pursuit (SOMP) 


Require: Y G $ G s > 1 

1 

Initialization: ^ Y and S'o ^ 0 

2 

t ^ 0 

3 

while t < s do 

4 

Determine the atom of €> to be included in the support: 
jt ^ argmax^-(||(R(‘^)'^</)^-||i) 

5 

Update the support : St+i •(— S'* U {jt} 

6 

Projection of each measurement vector onto span($Sj^,^): 

7 

Projection of each measurement vector onto span($ 5 j^,^)-’- : 

^(t+l) ^ y _ y (t+1) 

8 

t i — t 1 

9 

end while 

10 

return Ss {Support at last step} 


measurement vector (where refers to the /c-th column of since 

K 

fc=l 

The index of the atom maximizing the ii norm is included in the support (step 5). Then, 
the original signal Y is projected onto the orthogonal complement of span($ 5 j^j) (steps 6 and 

7). 


In this setting, SOMP stops after exactly s iterations. However, it is worth mentioning that 
the stopping criterion usually comprises a criterion based on the number of iterations as well 
as another one relying on the norm of the residual, i.e., if the norm of the residual is below 
a certain threshold, then OMP stops. Different norms can be used for the second criterion 
but these considerations will not be further discussed in this paper. The interested reader can 
consult [3] for related matters. 

Furthermore, maximizing the ii norm in step 4 is not the unique choice. Other authors 
have investigated different norms, e.g., the £2 and £00 norms. Nevertheless, some numerical 
simulations reveal that the choice of the norm has very little effect on the performance (see [H 
Figure 3]). 

3.2 SOMP-NS: A weighted greedy algorithm 

We now present the development of a noise stabilization strategy to be used in conjunction with 
SOMP that has low computational requirements. The equivalent new algorithm is referred to 
as SOMP-NS where NS stands for noise stabilization. Algorithm describes the first form of 
SOMP-NS. 

This novel algorithm is a generalization of SOMP that weights the impact of each measure¬ 
ment vector within matrix Y G on the decisions performed at each iteration. 

SOMP-NS is actually very close to SOMP. Both algorithms decide on which atom to pick 
on the basis of a sum of absolute values of inner products, each term in the sum depending only 
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Algorithm 2: 

SOMP with noise stabilization (SOMP-NS) - First form 


Require: Y G ^ g {qk}i<k<K, s > 1 

I 

^(0) ^ Y and 5o 0 {Initialization} 

2 

t ^ 0 


3 

while t < s do 


4 

Determine the atom of 4> to be included in the support: 


jt ^ argmax^- (^X)f=i 


5 

St+i St Li {jt} 


6 

Projection of each measurement vector onto span ($5 ): 



Y 

7 

Projection of each measurement vector onto span(^ 5 ^^j^)“^: 

^ y _ y(t+l) 

8 

t i — t -j- 1 


9 

end while 


10 

return Sg {Support at last step} 


on one measurement vector. SOMP gives the same importance to each measurement vector 
whereas its weighted counterpart introduces weights > 0 (1 < fc < iF) so as to give more or 
less importance to each measurement vector. 

A second form of SOMP-NS that is more computationally efficient is available in Algorithm 
In the second form, SOMP(l^, s) refers to the regular SOMP algorithm described in 
Algorithm 

Algorithm 3: 

SOMP with noise stabilization (SOMP-NS) - Second form 

Require: V € $ e {qk}i<k<K, s > 1 

1: diag(qi,... 

2: The columns of Y are weighted beforehand: Y ^ YQ 
3: Apply the regular SOMP algorithm: S ^ SOMP(Y', s) 

4: return S 


Both forms are equivalent since \ 4>j)\qk = \ {rk\k, <Pj)\ = \ 4>j) \ and {R^^'^Q)k = 

{{I — ~ Q)k- The last equality holds true because the orthogo¬ 
nal projector {I — is applied to each column of separately. Although the residual 

matrix is different for the two forms of SOMP-NS, the difference only consists in a mul¬ 
tiplicative term for each column k of and it does not modify the atoms added to the 
estimated support. 

4 Technical background for the theoretical analysis 

This section briefly explains the mathematical tools needed to conduct the theoretical analysis. 

4.1 Dictionary and coherence 

Let $ G tjg a matrix composed of n column vectors cf)j G M™. Moreover, Vj G [n], 

||0j||2 = 1. The coherence of $ is defined as 
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//($) := max|(0j,<^ )| =max|^^^ | (5) 

Similarly, one can define the cumulative coherence function (also referred to as the Babel func¬ 
tion) of $ as 

fii ($, p) = rnax max V | ( 0 *, | • ( 6 ) 

A,|A|=p ^ ' 

Trivially, /u($) = /xi($,l). For the sake of brevity, /x($) and pi($,p) will be respectively 
denoted as p and /ii(p) from now on. 

It is also frequent to use the bound 

pi(p)<pp. (7) 


4.2 Support and norms 


If [/ = (ui, U 2 , ..., Un) G ]^mxn Uj G (1 < j < n), then supp(L/') := 

Uje[n] ®upp(uj). The definition above extends the notion of support to matrices, i.e., the 
support of a matrix is the union of the supports of its columns. Similarly to the vector case, if 
U = (ui, U 2 , ..., Un) G then ||I7||o := |supp(I7)| = Ujg[n] supp(uj) . 


We define ||(?!>||min := l^il which is not a norm. Furthermore, for $ G ||$||p_^q 

is defined as [211 Equation A. 8 ] ||4*||p_^q := sup||(^||^=;^ ||$<^||g where 1 <p,q< oo. For the sake 
of simplifying the notations, we will adopt the convention ||^||p := ||$||p^p. It can be shown 
that, for $ G [2T1 Lemma A.5] 


|$||oo = maxy^ |4>jj| = 

*e[m] 


1 $ 


T| 


( 8 ) 


4.3 Sparse rank 

The sparse rank m of $ G denoted by spark(4'), is given by 

spark($) = min ||u||o s.t. = 0. (9) 

spark(4') is thus the smallest number of linearly dependent columns of $. Equivalently, it 
means that, if spark($) > s, then, for every support S such that IS"! < s, the columns of $5 
are linearly independent, i.e., $5 has full column rank. 

Note that computing spark(4*) for a given matrix $ is not computationally tractable as this 
problem is even harder to solve than a Iq norm minimization problem which is known to be 
NP-hard [T^ . 

4.4 Restricted isometry property 

The matrix $ G satisfies the so-called Restricted Isometry Property (RIP) |Z] of order s 

if there exists a constant <5<j < 1 such that 

(1 - (is)||u||i < \\^u\\l < (I -h hs)||u ||2 (10) 
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for all s-sparse vectors u. The smallest 5s that satisfies Equation (10) is called the Rectricted 
Isometry Constant (RIC) of order s. The RIC of order s can theoretically be computed by 
considering 


Us = max Amax(^l^s) - 1 
S’C[n] 

|5|=s 

Ls = l- min Amin(*|^s) 
SC[n] 

|S|=. 


where Amin and Amax denote the smallest and largest eigenvalues respectively. Then, 

5s = iaiax{Us,Ls). ( 11 ) 

Evaluating Ug and Lg is not computationally tractable as it requires to determine the smallest 
and largest eigenvalues of (^) matrices of size mx s. In particular, this problem has been shown 
to be NP-Hard in the general case j35j. It is therefore interesting to find an upper bound on 5s 
that can be easily computed. 

Lemma 1 (Coherence bound on the RIP). If ^{s — 1) < 1, then 

max Amax(^|* 5 ) < 1 + Aii(s - 1) < 1 + (s - 1 )^ 

SC[m] 

|S|=s 

min Amin(^l^s) > 1 - t^iis - 1 ) > 1 - (s - 1 )^. 

SC[m] 

|S|=. 

The first inequality of each line holds if fii{s — 1) < 1. 

Proof. This result is obtained in [3]. □ 

A consequence of Lemma[^is that, if fi{s — 1) < 1, then 

< lii('S - 1) < (s - 1)/U- (12) 

It is worth noticing that if 5s < 1, then spark($) > s. The reason is that < 1 implies that 
min 5 c[m],|S|=s-^min(^S^s) > 0 which in turn implies that, for every support S of cardinality 
equal to or less than s, $5 has full column rank. 

4.5 Lipschitz functions 

A Lipschitz function / : —)• M : 1 —;■ f{g) with regards to metric £2 is a function that satisfies 

3L > 0 :\/x,y e , \f{x) - f{y)\ <L\\x- yW^ . (13) 

The constant L is called the Lipschitz constant. 

5 SOMP-NS: general theoretical results 

We now wish to derive basic theoretical results needed to analyze the performance of SOMP- 
NS in both the noiseless and noisy cases. The theoretical framework developed hereafter will 
be used in Section to give lower bounds on the probability of partial or full recovery using 
SOMP-NS when the additive noise is Gaussian. 
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5.1 Support-dependent exact recovery criterion without noise 

We now wish to build an exact recovery criterion (ERC) that guarantees, for dictionary matrix 
the recovery of every sparse signal of support S by means of SOMP-NS. The result that will 
be obtained is similar to an earlier result of J. A. Tropp, sometimes referred to as Tropp’s ERC 

iMj. 


First of all, it is worth noticing that the maximum correlation in 
be written as 


/ K 

maxjg[„] X] 

qk = 


\fc=i 

J 



step 4 of Algorithm can 
(14) 


where Q = diag {qi,..., qk) is the diagonal matrix that contains the weights to be applied to 
each column vector of It is assumed that (7fc>0 (l<A:<Ar). 


The next result allows to state the ERC for SOMP-NS. It will also be used to develop the 
exact recovery criterion for the noisy case later. 

Lemma 1.1 (A lower bound on the maximal residual projection). Let $ G 

full column rank for some support S C [n]. Moreover, = ^St^St denotes the orthogonal 

projector onto span($sj where St C S. Let R^^'> = (/ - Pd'))Y = (/ - pW)$X, Y G 

and X G where K > 1. It is assumed that supp(W) = S. Under these conditions, the 

following inequality holds true 

||$|pWQ||oo (15) 

where S denotes the relative complement of S with respect to [n]. 

Proof. Following the steps of [U Theorem 4.5], the lemma is easily obtained. Each column 
of the matrix R^'l can be expressed as a linear combination of the columns of $ 5 . The rea¬ 
son that explains this last statement is that Yk = {^X)k = ($ 5 X 5 )^ G span($ 5 ) and 
{{I - pdl)Y)k G span($ 5 ) U span($ 5 j = span($ 5 ). 

Moreover, $5 is guaranteed to have full column rank which implies that the Moore-Penrose 
pseudoinverse is equal to ($^$ 5 )“^$! and consequently that (($^)'^$ 5 )P*'*^ = = 

Rd\ Furthermore, it is easily established [ 8 l Lemma 4.4] that for two matrices A and B, 
||A.P||oo ^ II A.|| 00 II P ||oo . 

Combining the results above yields 

||^Jp^*)Q||oo = ||$|($^)^$|pWQ||oo 

< ||$|(^ + )T||^||^T^Wg||^ 

Finally, using Equation (j^ shows that ||$^($g)"'"||oo = and concludes the proof. □ 

We are now ready to provide an ERC for SOMP-NS. 

Theorem 1 (ERC for the noiseless SOMP-NS algorithm). Let $ G and X G . If 

Y = ^X and $5 has full column rank, then a sufficient condition for SOMP-NS to properly 
retrieve the support S of X after exactly |5| iterations is 

ll^5%lli < 1 (16) 

where S is the relative complement of S with respect to \n\. 
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Proof. Let us assume that SOMP-NS has made correct decisions before iteration t. The greedy 
selection ratio (at iteration t) is defined as: 




(17) 


Clearly, pt < 1 ensures that SOMP-NS performs a correct decision at iteration t since it im¬ 
plies that the largest sum of inner products is obtained for one of the atoms belonging to the 
support S, i.e., Since it is assumed that SOMP-NS has only 

made correct decisions so far, Lemma 1.1 shows that Pi < 1 holds true whenever < 1. 


Furthermore, < 1 ensures that a correct atom is picked during iteration 0. By 

induction, the hypothesis that correct decisions have been made so far will be satisfied for every 
iteration and full support recovery at iteration |5| — 1 is thus guaranteed. □ 

It can be shown that Theorem is sharp in the sense that if the inequality < 1 

is not satisfied for some support S, it is always possible to find a Xbad whose support is S for 
which SOMP-NS identifies an incorrect atom at the first iteration. The sharpness property for 
SOMP-NS directly derives from |30l Theorem 3.10] which provides an equivalent property in 
the SMV case for OMP. Indeed, if OMP fails to choose a correct atom with vector £Cbad) then 
SOMP-NS also fails with Xbad = (^bad, • • ■ j ®bad) as, in this particular case, OMP with signal 
y = $®bad and SOMP-NS with signal Y = $Xbad make the same decisions. 


Moreover, the theorem above is generally of theoretical use only since computing 
requires to know the support beforehand. Also, computing ||$g$^||i for all the possible sup¬ 
ports of a given size is not computationally tractable. 

However, it is shown in [30] that if /r < 2 | 5 jir[, then ||$g$g||i < < 1 for all the 

supports of size |5|. It is worth noticing that $5 is full rank for all supports S of size s if and 
only if spark($) > s. 

Although Theorem proves to be interesting for FI = 0 in signal model Q , it should be 
extended to the noisy case which is the purpose of the next section. 


5.2 Correct support detection criterion in the noisy case 

We develop here an ERC for the noisy case which generalizes Theorem to this context. To 
reach this result, let us assume that SOMP-NS has made correct decisions before iteration t. 
First of all, we separate the contribution of the noise E and that of the useful signal X: 

R(t) = 

= {I - + E) 

= {I- + {I- P^^^)E 

' -V-" '-V-" 

=£(*) 

= zW -hf;W. 


Then, the SOMP-NS correlation is lower bounded by 

||$|i^WQ||oo = + E^'^)Q\\oc 

> II^IZ^QIloo - 


(18) 
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Moreover, the triangle inequality yields 


||$|i?WQ||oo < II^I^WqIIoo + ||$|i;WQ||oo. (19) 

Since SOMP-NS makes a correct decision at step t if the in¬ 

equalities above show that this condition is always satisfied whenever 


II^IzWqiIoo - ||$|zW( 5 iioo > ii$|£;Wqi|oo + ii^|£;Wqi|oo. (20) 

It is worth noticing that, due to Lemma [13 the following relationship holds true: 


Note that the assumption that correct decisions have been so far is crucial for this result to be 
true. Furthermore, as $5 and are column submatrices of one easily obtains 

||$|F;Wg||oo < ii^^i^wgiioo, 

||$|F;Wg||oo < ||$^i;Wg||oo. 

A sufficient condition for ( |20[ ) to hold is thereby 

(1 - ll^IzWgiioo > 2||$T£;Wg||oo 

Theorem summarizes the previous discussion by explicitly stating the ERC in the noisy case. 

Theorem 2 (ERC for the noisy SOMP-NS algorithm). Let X be the sparse matrix to he 
retrieved and let supp(X) = S denote its support. Let $5 exhibit a full eolumn rank. Let us 
assume that only correct atoms have been picked before iteration t < IS"! and that the reduced 
dictionary matrix $5 has full column rank. SOMP-NS with dictionary matrix $ and signal 
Y = + E is guaranteed to make a correct decision at iteration t whenever 

(1 - ||$J%||i) ||$|zWg||oo > 2||$T^(t)Q||^ (23) 


( 21 ) 

( 22 ) 


where = (/ - pW)l", = {I - P^^'>)^X and = (/ - P^^^)E. 


Noticeably a necessary condition for satisfying (23) reads < 1 which is precisely 

the ERC obtained before for the noiseless case. Moreover, low values of imply a bet¬ 

ter robustness against the noise as the condition hereabove is then more easily satisfied. This 
observation is not surprising since the robustness in the noiseless case determines the amplitude 
of the noise we can apply without ruining the support recovery. 


Theorem is the cornerstone of the theoretical analysis conducted in this paper. However, 
as done hereafter, it remains desirable to find a lower bound for ||$^Z^*^g||oo that expresses in a 
simpler manner the impact of the signal to be estimated, the weights and the dictionary matrix. 
Also, the term ||$'^£l*'*^g||oo will be dealt with by means of a statistical analysis presented in 
Section [ 6 ] when the noise is Gaussian. 


5.3 Lower bound on 


We now focus on deriving a lower bound for \\^'gQW^o that can be easily evaluated. Ver¬ 
ifying Equation (23) with this computable lower bound will provide a sufficient condition for 
the ERC to hold. In particular, we desire to obtain a lower bound that does not rely on the 
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knowledge of the particular support S that is chosen. 


To reach this goal, we chose here to extend the method proposed in [3] to MMV prob¬ 
lems. The following theorem mainly has a theoretical interest and is afterwards particularized 


in Corollary 3.1 that will used in the rest of the paper. Corollary |3.2| will provide a variant of 


Corollary 3.1 relying on both the coherence of the dictionary (instead of the RIP) and on the 


quantity 


Theorem 3. Let S := supp(X) C [n] and let St denote the indexes of the atoms chosen by 
SOMP-NS at iteration t. It is assumed that St C S, i.e., only correet decisions have been made 
before iteration t. Jt = S\St contains the indexes of the correct atoms yet to be selected at 
iteration t. Let = (/ — where denotes the orthogonal projector 


onto span($ 5 'J. Then, for any G {“1; 1} < K), 


™(*) 


(24) 


where x^^'> = Ylk=i^kC^k^Qk- Moreover, if ^ satisfies the RIP with \IIt\-th restricted isometry 
constant < 1, then 




^Jt 


(25) 


Proof. Denoting the k-th. column of by we first observe that 


K 


|$|zWq||oo = max ^ 
\k=i 


it) 


^r^k 


qk 


(26) 


the maximum being taken over j G Ht since is orthogonal to span($ 5 j because of the 

orthogonal projector {I - P^^'i). Since = 1, |(0j,= \{4>j,Zk^)\ 
which implies 


K 


|$|zWq||oo = max | 


(27) 


The triangle inequality yields 


l^^Z^^^QWoo > max 
j&Jt 


K 


k=l 


Thus, 






^it) 

Jt 




^it) 


^it) 

^Jt 
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The first inequality results from the observation that, for any vector x G we have 

||®||2 < ^/\J7\\\x\\oo- The inequality is available 

in [3l Lemma 5]. The first part of the theorem is now proved. 


If $ satisfies the RIP with RIC 5\j^\ < 1, then Equation (11) yields 
which proves the second part of the theorem. 


□ 


The result above shows that the decision metric in SOMP-NS corresponding to the correct 
atoms, i.e., is closely related to the £2 norm of the signal to be recovered and 

to the singular values of It is clear that the ability of $ to conserve the norm of sparse 
vectors is necessary to ensure that the measurement noise does not absorb 


Through the following corollary, we wish to obtain a simple term that replaces ||a;j^|| 2 . 

Corollary 3.1. Let S := supp(Ar) C [n] and let St denote the indexes of the atoms chosen by 
SOMP-NS at iteration t. It is assumed that St C S, i.e., only correct decisions have been made 
before iteration t. Jt contains the indexes of the correct atoms yet to be selected at iteration t. 
Let = (/—where denotes the orthogonal projector onto span($ 5 j). 

If ^ satisfies the RIP with restricted isometry constant 5\j^\ < 1, then 


K 

1^5-^^‘^QIIoo > {1 - 6\j^\)uim'^\Xj^k\qk 

^ * k=l 

> tr(Q)(l inin \Xj^k\- 

kG[K] 


(28) 

(29) 


Proof. It is easy to show that for any x G 

\\XJ,\\2 > V^II^Ttllmin > 

It implies that 




K 


K 


'^Mjtcfqk 

k=l 

In particular, the choice of the c^^ is arbitrary so that the best lower bound is given by 


> ^/[jijvain'^Xj^kcfqk- 

l£b — 

2 k=l 


K 

E 

k=l 


Mjtcflk 


> \/\^\ 


K 

max mm V 

^j,k('k 




One easily notices that max m minjg5Xf=i^7,fc4*^Q'fc < If 3* '■= 

argmin^ then choosing = sign(Xj*^fc) is optimal, 

i.e., max^^(t)^ min^gs Yl!k=i ^jl^k^Qk = min^gg \Xj,kWk = Ef=i \Xj*,k\qk- Together with 

Theorem this result shows that the first inequality holds true. The last inequality is then 
trivially obtained. □ 


Both inequalities in the result above explicitly emphasize the impact of the weights, the RIC 
and the amplitude of the coefficients to be recovered. However, regarding the weights, it is clear 
that they will also impact the value of ||$'^P^*^(5||oo- Moreover, Equation (28) suggests that, in 
order to reliably retrieve an atom, the sum of the absolute values of the associated coefficients 
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in X should be high enough. 


As already mentioned in the introduction, Theorem appears to be a new result while 
Corollary 3.1 has already been obtained in the literature \12\ Theorem 5]. 


It is now possible to use the coherence-based inequalities provided by Lemma so as to 
derive a new bound on the basis of the previous one. We obtain Corollary |3.2| which should be 


understood as the coherence counterpart of Corollary 3.1 


Corollary 3.2. Let S := supp(X) C [n] and let St denote the indexes of the atoms chosen by 
SOMP-NS at iteration t. It is assumed that St C S, i.e., only correct decisions have been made 
before iteration t. Jt contains the indexes of the correct atoms yet to be selected at iteration t. 
Let = {I—P^^'^)^X where = ^St^St denotes the orthogonal projector onto span($ 5 j). 
If - 1) < 1, then 


K 


|$|zWQ||oo > (1 - tJ-i{\Jt\ - l))min 

k=i 


Moreover, if (ISI —t — 1)|U < 1, then both Equation (30) and (31) hold true. 


(30) 


K 


|$|zWg||oo > (1-(|S| - t-l)M)minV|X,-fc|(7fc. 


(31) 


Proof. Using Lemmaand the equality \Jt\ = IS"! — t, one obtains 

> 1 - {\Jt\ - l)/i 
= 1 - {\S\ - t- l)/i. 

The first inequality makes sense only if /ii(|j7)| — 1) < 1 while the last inequality requires 
{\S\-t-l)n<l. □ 

Although Corollary 3.2 is less powerful and general than Corollary |3.1[ it provides an inter¬ 
esting insight into how the coherence of the dictionary influences 


6 Theoretical analysis for the Gaussian noise case 

The previous section provided a non-probabilistic analysis of the quantity Q\\^ by de¬ 

riving lower bounds that are more simple to evaluate than the original quantity. Regarding 
the noise-related quantity we will in this section perform a stochastic analysis to 

derive a lower bound on the probability that it does not exceed a threshold e for Gaussian noises. 


As shown by Theorem it is possible to examine whether SOMP-NS succeeds in choosing 
a correct atom at step t by evaluating separately quantities linked to the sparse signal to be 
estimated and the noise vectors, and respectively. Since several 

simple lower bounds for have been found, it becomes possible to evaluate a lower 

bound on 


< 0.5 (1 - ||$+%||i) 


(32) 
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i.e., a lower bound on the probability that SOMP-NS makes correct decisions for signal model 
Q according to Equation (23) of Theorem]^ Corollaries |3.1| and 3.2 yield 


where 


> 


> 


< 0.5 (1 - II^IzWqIIoo 

$^£;Wq||oo < 0.5 (1 - ||$+%||i) 

II^^E^Wgiloo < 0.5 (1 - ||$|zO)g||^) 


|$|zWg||^lP) := (l-h|^^|)min^|X,-,|g, 


k=l 


K 


|$|zWg||^) := (1 _ (1^1 _ t _ i)^) mlnY,\X,,k\qk. 

k=i 


(33) 


(34) 

(35) 


A statistical analysis of ||$"'^£^^*^g||oo is proposed when ~ AA(0, crllmxm) and _LL 
for ki 7 ^ k 2 - The advantage of our approach is to take into account the isotropic nature of 
statistically independent Gaussian random vectors. Our main result is Theoremwhich shows 
that the probability of making incorrect decisions from iteration 0 to iteration s < [S'! in¬ 
cluded decreases exponentially with regards to a certain number of parameters. This theorem 
is then particularized so as to make use of the coherence /r of $ instead of the RIP and the ERC. 


0 


Eirst of all, the statistical properties of \ {4>j)^k) \ Qk for a single and arbitrary atom 

^ are investigated in Section 6.2 These properties are then extended to ||$"'"Eig||oo = 

On the basis of this last result, Theorem 


max 


ye[n] (El =1 in Section 


6.3 


1 ^ is 


obtained in Section |6.4[ A particular case of the signal model is then examined in order to 
ease, in Section the comparison of the results provided by means of the theoretical bound 
and those obtained by simulation. 


6.1 Assumptions and theoretical framework 

In this section, we restate some assumptions and define quantities used later on. 

It is assumed that the entries of each [1 < k < K) are i.i.d. mean-zero Gaussian random 
variables of variance cr|, i.e., ~ Af{0, a'^Imxm)- Eurthermore, the noise vectors are sta¬ 

tistically independent. Einally, the columns of $5 are assumed to be linearly independent. As 
already mentioned in Section [4.4[ the latter condition is true for all supports S of cardinality s 
whenever < 1 . 

We dehne 

T 

q2, •••, qx) 

a := (iTl, £ 72 , • • • , (Tk)"^ 

q('^) := (aiQ^i, a2q2, •••, crKqx)'^ ■ 


Before going on fnrther, we also dehne 


(36) 

(37) 

(38) 













and 


h{q,(T) = 



(40) 


Corollary |3.1| shows that a sufficient condition for SOMP-NS to choose a correct atom at 
iteration 0 is given by 


max 

ie[n] 


K 


Y,\{4>j,eu)\qk 


\k=l 


< 0.5 (1 


K 

^ k=i 


(41) 


Using the npper bound < ||0j|| 2 ||efc ||2 = ll^fclb is a recurrent solntion in the 

literature and provides satisfactory performance in a SMV setting (see 0). However, nsing 
such an approach in a MMV setting does not properly capture the performance gains obtained 
whenever K increases as this upper bound assnmes that all the error vectors are aligned 
with a single atom at each iteration. While this approximation is acceptable whenever only 
one measurement vector is available, it proves to be highly pessimistic as soon as one considers 
many independent (and usually isotropic) error vectors. 


This observation motivates an in-depth analysis of the statistical properties of 
maXj g (Ef=i !(<#>, ; . The analysis conducted hereafter mainly relies on the notion of 

Lipschitz functions and the related concentration ineqnalities. 


6.2 Concentration ineqnalities for one atom 

In this section, we are interested in providing a lower bound for P {^k=i \ {^j^ ^k) \ qk > for 
an arbitrary atom (pj (1 < J < n). The main resnlt of this section is Lemma |5.1[ 

Theorem 4. ]21{ Theorem 8 .40.] Let f : —>■ M 6e a Lipschitz function (with regards to 

the metric I 2 ) with Lipschitz constant L. Let g = ( 51 , 52 , • ■ • ,gn) be a vector of independent 
standard Gaussian random variables. Then, for all £ >0 

P {f{g) - E[f{g)] > e) < exp > (42) 


and consequently 

^{\f{ 9 ) -E[f{g)]\ >£)< 2exp • 

The theorem above shows that Lipschitz functions f{g) tend to concentrate around their 
expectations when g is distributed as a standard Gaussian random vector. Moreover, the con¬ 
centration gets better as the Lipschitz constant L decreases. 

This theorem is intended to be used in conjunction with the function 

K 

f : g ^ f{g) = J2(lk(^k\gk\ (43) 

k=l 

where cJfc > 0 (1 < A: < K). This function will be shown to be equivalent to Ylik=i \{^ji ^k)\ qk 
when Cfc is Gaussian. 
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We now wish to establish that / is a Lipschitz function, compute the associated Lipschitz 
constant and determine its expectation. Let x,y £ then, using the reverse triangle inequal¬ 
ity and the Cauchy-Schwarz inequality. 


\f{x)- f{y)\ 


K K 

^ ^ Q.k(^k\xk I ^ ^ (lk^k\yk \ 

k=l k=l 


K 

< '^qk(Tk \xk - yk\ 

k=l 

< ||q('^)||2||® - yh 


Therefore, a valid Lipschitz constant L of / is equal to ||q('^)|| 2 . This is the best Lipschitz 
constant since, for y = 0 and x = |/(®) “ f{y)\ = Ibll® — y\\ 2 - 


Using the concentration inequalities for Lipschitz functions requires to know the value of 
E [f{g)]. Using the linearity of the expectation and the fact that for g ~ AA(0,1), E[|g'|] = y^ 2 / 7 r, 
one easily obtains 

^ifia)] = = b{Q,(x). 

By using Theorem it is now possible to conclude that, for e > 0, 

IP ifia) - b{q, cr) > e) < exp (-K(q, a)e‘^) . (44) 

The final step of the development is to prove that the function / defined above is distributed 
as f{g) for g ~ Af{0,lmxm)- Indeed, akgk ~ AA(0, cr^) and {4>j,f^k) = YA=i(4>j)i{f^k)i ~ 

■^( 0 , W^jWWk) ~-^(o,u^). 

The following lemma summarizes the discussion above and will be used to establish the 
theorem of the next section. 

Lemma 5.1. Let 4>j G M™' where ||0j||2 = 1 (^1 < J < n). Let (1 < k < K) be independent 
random variables respectively distributed as J\f {0,a‘llmxm) where > 0 . It is also assumed 
that > 0. Then, for e > 0, 

IP gfc > 6(q,<T)< exp (-^(q, cr)e 2 ) . ( 45 ) 

It is important to highlight that Lemma [5 .1 1 provides an upper bound of the probability that 
I (0j) fife) I dk is higher than e' = b{q, cr) + e only if e' is higher than b{q, a). 

6.3 Concentration ineqnalities for n atoms 

The next problem to be tackled is that we would like to estimate an upper bound of the 
probability that niaxjg[„](^^^ \ {c))pek)\qk) is higher than a threshold e' instead of the same 
probability obtained for K^j’ ^k) \qk■ The issue we are facing is that the n random variables 

Ylik=i\^4>ji^k)\qk (for 1 < j < n) are not statistically independent which implies that the 
probability of upper bounding all these n random variables simultaneously is not equal to the 
product of the probability to upper bound them separately. However, it remains possible to 
find a workaround using the union bound, as demonstrated by the following theorem. 
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Theorem 5. Let 4>j G where ||<^j ||2 = 1 ft ^ j ^ n). Let ft <k < K) be independent 
random variables respectively distributed as where ak > 0. It is also assumed 

that Qk >0. Then, for e > 0, 


Equivalently, 


> b{q, a) +e < nexp {-K{q, cr)e^) . 

< b{q, a) + e > 1 — n exp [—K{q, cr)e^) . 


(46) 


(47) 


Proof. First of all, we observe that = 

union bound, 


max 


■je[n] 


(Ef=i \ {(t>j,ek)\ qi^. Then, by 


K 


max 

ie[n] 


X] I ('^3’ I ® s *’(9' 


U 


\k=l 
/ K 


n 


^ K<^j,efc)| gfc >b{q,a)+£ 
j=l L \A:=1 / 

K \ 

^ |(</'j,efc)| 1 > b{q,a) + 

3=1 L \fc=l / 

<n exp (^—K{q, cr)e^) . 

The first inequality results from the union bound while the second inequality holds because of 
Lemma EH □ 


6.4 Full recovery of the support 

Theorem implicitly provides a lower bound on the probability of making correct decisions 
during iteration 0. It is indeed possible to use either Equation (34) or (35) in conjunction with 


Equation (33) and Theorem]^ to obtain the desired lower bound. Nevertheless, a lower bound 
on the probability that SOMP-NS makes correct decisions from iteration 0 to iteration s < |5| 
remains to be found. This is the purpose of this section. 


Before establishing the main result, one needs Lemmas 6.1 and 6.2 


Lemma 6.1 (On the distribution of {4>j, {I — P)ek)). If ek is distributed as M{0, a^Irrixm) cind 
P is a fixed orthogonal projector matrix, then (fij, {I — P)ek) is distributed as J\f{0, cr'ft) where 
dfc < ak\\<t)j\\2 = CTk- 

Proof. We have 

{(fj,{I-P)ek) = {{I-P)fij,ek). 

An auxiliary atom cfj can be defined for index j, cfj := (/ — P)4>j- This atom satisfies the 
inequality ||(/>j ||2 < ll^jlb = 1- Let us now observe that if the entries of are i.i.d. random 
variables distributed as AA(0,fT^), then {cj}=ek) ~ A/’(0, ||(^J||cr^)). The result immediately 


follows. 


□ 


Lemma 


6.1 


basically establishes that, for a fixed projection matrix P, {(fj, {I — P)ek) is 
distributed as a mean-zero Gaussian random variable whose variance is always lower than that 
of the entries of e^.. This result will enable us to apply a statistical analysis similar to that of 
the previous section by keeping the Gaussian hypothesis. 
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Lemma 6.2 (Upper bounding the sum of random variables). We consider the random variables 
X, Yi ~ AA(0,(TyJ and Y 2 M{0,aY^) where cjyj > fTy 2 > 0. It is assumed that X AL Yi and 
X YlY 2 . Then, for all e G M, 

1P[X + |11| <e] <P[^ + |:^2| <£]• (48) 

Proof. We know that -F|y^|(y) = erf(y/(\/2(Ty^)) and F|y,|(?/) = erf(y/(\/2(jy2)) where 

erf (x) := f exp (—^^) d^. 

JO 

Since erf(x) is a monotonically increasing function, one obtains Vy > 0, -F|y^|(y) < L|y 2 |(y). 
Thus, 


'[X + \Yi\<e]= ¥[\Yi\<e-x]fx{x)dx 
J —00 

< f F[\Y 2 \ < e - x]fx{x)dx 

J — 00 


= p[x + |y2| <e]. 


□ 


In the rest of this paper, the random variable X of Lemma 6.2 will be replaced with a sum of 
K — 1 independent random variables exhibiting half-normal distributions, i.e., X = \^k\ 

where the X^ Y k < K — 1) are independent and X^ ~ AA(0, it|). Thereby, an immediate 
corollary of the lemma above is that the probability of upper bounding by e > 0 the sum of sta¬ 
tistically independent random variables exhibiting half-normal distributions is always decreased 
whenever the variance of at least one of the random variables is increased. 


Let us now state the key theoretical result of this paper, which provides a lower bound on 
the probability that SOMP-NS selects correct atoms during the first s -|- 1 iterations. 

Theorem 6 (A lower bound on the probability of partial or full support recovery by SOMP-NS). 
Let (1 < k < K) be statistically independent Gaussian random vectors respectively distributed 
as Af{0, (j'f.Imxm)- Let E = (ei, ..., ex) ■ Let X G ^/^g gig^al matrix whose support is 

S. Let also 0;^,..., be unit-norm vectors in and $ = (0]^,..., 0„) be the corresponding 

matrix which is assumed to satisfy the RIP with \S\-th RIG < 1. Let qi,...,qK 0 0- 
Then, SOMP-NS with dictionary matrix weights q^ Ifk < K) and signal Y = -|- E 

is ensured to make correct decisions during the first s -|- 1 iterations, i.e., from iteration 0 to 
iteration s included, with probability higher than 


1 - nCs exp (-K(q, cr)e(q, af') 
whenever e{q, cr) := e{q) — b{q,cr) > 0 where 




i=0 


K 


z{q) := 0 . 5 ( 1 - ||$^%||i) (l-(5|5|)min^|Xj'fc|gfc, 

k=i 


K{q,a) and b{q,a) being defined in Equation (39) and (fO), respectively. 


(49) 


(50) 

(51) 
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Proof. First, we observe that {cj>j,(I — = ((/ — ei~) where it is convenient 

to define the j-th auxiliary atom at iteration t as = {I — P^^'^)4>j. According to (33), at 
iteration t (for 0 < t < s), a, sufficient condition for SOMP-NS to make a correct decision at 
iteration t is given by 


K 


max 

j'eN 


\k=l 


K 


qkj < 0.5(1 - ||$^%||i)(l 


Since a < b implies 6a < 6b, then 1 — 6\j^\ > 1 — (^| 5 | which shows that a less tight sufficient 
condition for SOMP-NS to make a correct decision at iteration t is 


K 


max 

j'eH 


E 


\k=l 


dt) 


K 


efc 


qk] < e(q) = 0.5(1 - ||$_^$; 5 ||i)(l - 5|5|)min^ (52) 

/ k=i 


Therefore, if the condition above holds for t = 0, SOMP-NS is guaranteed to pick a correct 
atom at iteration 0. At iteration 1, the orthogonal projector pd) can take |S| different values 
(each value of P^^^ corresponds to a specific support 5i C S). Therefore, if (52) holds for 
every possible projector matrix at iteration 1, SOMP-NS is guaranteed to pick a correct atom 
at iteration 1. Thus, we need to satisfy |5|n -|- 1 equations of the form Ylk=i \ {4>i ^k) \ Qk E s{q) 
(where \\cj )\\2 < 1) to ensure that SOMP-NS picks correct atoms during the first two iterations. 


Extending the previous train of thought from iteration 0 to iteration s, one easily comes 
to the conclusion that Cgn equations of the form Ylk=i \{4>^^k)\qk < ^(q) (where ||<^||2 < 1) 
should be satisfied since, at iteration t, there exist (^‘^^) possible realizations of the orthogonal 
projector P^^\ Using the union bound as in the proof of Theorem shows that the probability 
of satisfying the C^n equations is lower bounded by 


('ISU 

s \ t ) n 

i-EEE' 


t=0 i=l j=l LA:=1 


K 


X] ^k>£{q) 


(53) 


where p(*’®) is the Tth possible realization of the orthogonal projector at iteration t assuming 
that only correct atoms have been picked before iteration t. 


Lemmas 6.1 and 6.2 imply that 

K 


Lfc=i 


~ qk>e{q) 


< 


K 


^ \{ct)^,ek)\qk > e{q) 


Lfc=i 


(54) 


Furthermore, since e(qr) = £{q,cr) + b{q,a), Lemma 5.1 shows that 

K 


^ |(0j,efc)| qk > e{q) 


U=i 


< exp {-K{q,a)£{q,af) 


whenever e(q, cr) > 0. Finally, the probability that SOMP-NS chooses correct atoms from 
iteration 0 to iteration s (let us denote this event by Egucc) satisfies 




> 1 — nCs exp (—K(q, cr)e{q, <t)^) . 


□ 
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The theorem above translates several intuitive realities. First, by examining the expression 
of e, one concludes that several quantities influence the probability of correct recovery: 

• The expression quantifies the robustness of the recovery in the noiseless case. 

It is evident that the margin of error when deciding which atom to choose in the noiseless 
case will contribute to determine the admissible noise level in the noisy case. 

• The term (1 — (5|5|) translates the ability of the dictionary matrix $ to maintain the (.2 
norm of |S|-sparse signals, which ensures that the norm of the columns of remain 
sufficiently high to avoid being absorbed by the noise matrix 

• The minimized sum miujgs \Xj^k\<lk depicts the idea that the weighted sum of the 

coefficients associated with every atom should be high enough to allow SOMP-NS to iden¬ 
tify them when noise is added to the measurements. This term simultaneously captures 
the influence of the signal to be recovered as well as that of the weights. 

Moreover, «:(q, a) indicates that increasing the noise variances decreases the probability of 
support recovery. Also, increasing the weights naturally augments the power of the noise signal 
Nevertheless, it should be expected that this effect is counterbalanced by e{q) as the latter 
variable is also a function of the weights. Finally, n translates the fact that the noise affects 
every atom while Cg takes into account the existence of s iterations. 


It is worth noticing that C| 5 |_i = — 1 < so that it indicates that the probability of 

full support recovery is lower bounded by 

1 — n2l'^l exp (—K(q, cr)e{q, <t)^) . 


On the basis of past results in the literature, it will be suggested in Section [63] that Cg is an 
artifact of our proof and should ideally be replaced with a function that increases more slowly 
as I S'! is augmented. 


Also, as a last remark, Theoremj^can be rephrased by stating that the probability of failure 
of SOMP-NS from iteration 0 to iteration s, i.e., at least one wrong atom is chosen during the 
first s -|- 1 iterations, admits the upper bound 

nCgexp {-K{q,cr)£{q,cr)‘^) . (55) 

Theorem is now to be particularized using the coherence of $ instead of the RIP and the 
ERG. 

Theorem 7 (A coherence-based lower bound on the probability of full support recovery by 
SOMP-NS). Let (1 < k < K) be statistically independent Gaussian random vectors re¬ 
spectively distributed as Af{0,a‘^lmxm)- Let E = (ei,..., Cj^). Let also X G ^/jg 

signal matrix whose support is S. Let also (fji,..., cf)^ be unit-norm vectors in and $ = 
( 01 ,... ,4>n) corresponding matrix whose coherence p, is assumed to satisfy p < 2 | 5 pi- 

Let qi,... ,qK > 0. Then, SOMP-NS with dictionary matrix weights qk (I < k < K) and 
signal Y = E is ensured to make correct decisions during the first s -|- 1 iterations, i.e., 

from iteration 0 to iteration s included, with probability higher than 

I — exp K(q, cr)e(^)(q, <t)^^ (56) 
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whenever ct) := e^^\q) — b{q,cr) > 0, where 

c. ^ ± 


i=0 


K 


;(M)(q) ;= 0.5(1 - ^(2|S| - l))min V 


(57) 


K{q,a) and b{q,a) being defined in Equation (39) and (40), respectively. 

if /U < then ||$g$g||i < 


5.1 


Proof. As it has already been pointed out in Section 

i_([g[^i)^ < 1 for ^11 the supports of size IS"!. Moreover, Equation (12) shows that Sg < (s — l)/i 
whenever g, < l/((s — 1)) Thus, using the two inequalities above in conjunction with Theorem|^ 
yields 

efo)(g) = 0.5 (^1 - (1 - (1*51 - 

Elementary algebraic manipulations show that the expression above simplifies to 

K 


^fo)(q) = 0.5(1 - /r(2|5| - l))min |X,- 


j&s 


j,k\Qk- 


□ 


k=l 


A similar although stronger result can be obtained by means of the cumulative coherence 
function ^i. However, the resulting expression of e^^\q) is more complicated. Moreover, 
coherence-based bounds only have a theoretical interest as they prove to be pessimistic for 
practical cases, even when using the cumulative coherence function. 


6.5 Related results 

A result similar to Theorem has already been obtained in j22| . The striking similarities 
between our result and that obtained in [22] motivate this section. They prove the following 
theorem. 

Theorem 8. Theorem 7] Let Y = + E £ 'gjnxK jj a \S\ x K matrix of 

standard Gaussian random variables, S = diag(cr?)jgs' and E an error term orthogonal to the 
atoms in |5|. Assume that the dictionary matrix $ satisfies the restricted isometry property 
(RIP) with restricted isometry constant (RIC) (of order |5| -|- Ij (5|5|+i < 1/3 and 




< 


^J^K min aiil-36\s\+i). 


Then, the probability that [S’! iterations of SOMP fail to exactly recover the support S on the 
basis of Y does not exceed n2l'^l exp (—ATy^/vr) with K the number of measurement vectors, n 
the number of atoms and 


7 := 1 - 35|5|+i - 



-1 


— K min (jj 
TT ieS 


1^ ^-*-'||00‘ 


In |22|, the statistical analysis has been focused on the sparse signal to be estimated, i.e., 
X, while no particular statistical distribution is assumed for E. Since the noise vectors are 
assumed to be orthogonal to the columns of X, their purpose is to model an approximation 
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noise, i.e., the part of the signal Y that cannot be mapped by Conversely, the noise 

signals envisioned in our paper represent additive measurement noises that cannot be assumed 
to be orthogonal to any vector subspace. 

In the noiseless case, a result similar to Theorem]^ has been obtained in |19l Theorem 6.2] 
for a variant of SOMP entitled 2-SOMP, i.e., a SOMP algorithm where the decision on which 
atom to pick is performed on the basis of the maximization of a £2 norm instead of a ii norm. 

Although the problem addressed in this paper is different from that of Theorem the 
authors of [191122| have used approaches similar to ours and they also noticed that the parasitic 
term Cg seems difficult to remove. The rationale of this discussion is thus that it is difficult to 
avoid the suboptimal term Cg when conducting developments similar to that presented in this 
paper. 


6.6 Conjectures 

Theorem [^ provides interesting insights into how successful SOMP-NS is whenever some pa¬ 
rameters are modified. However, the theoretical developments presented in this paper do not 
properly capture all the characteristics of SOMP-NS. 


First of all, regarding the value of n should be lowered in practice. The reason why 

n should be replaced by n <C n is because our analysis assumes that all the atoms not to be 
picked (of index j) are such that = max^ &siT.k=i\{(t>z^xf)\qk) while, in 

practice, only a few atoms are likely to exhibit a significant value of Ylk=i \ 

We conjecture that C|s|-i may be replaced by a linear function of [S'!. The reason why we 
failed at obtaining such a result is probably linked to the proof of Theorem [^ All the pos¬ 
sible supports at each iteration are considered and it is ensured that the sufficient condition 
for making a correct decision is satisfied for each support. This is very pessimistic as only one 
support out of all the numerous possibilities actually matters. As indicated in Section [631 other 
researchers working on similar problems have stumbled upon this issue and no solution has been 
found so far to the best of the authors’ knowledge. The simulation results presented in the end 
of this paper will however not address this problem. 


Furthermore, as it will be explained in Section 7.5, the bias b{q,a) should be removed in 


order to deliver results compatible with what has been observed in our numerical simulations. 
This term is most likely an artifact linked to Equation (18) and Equation (19). Equation (18) 
basically assumes that and (<^j,efc) always have opposite signs for atoms cf)j whose in¬ 

dexes j belong to the support S. Conversely, Equation (19) assumes that {cf)j,Xk) and 
always have identical signs for atoms whose indexes do not belong to the support S. Thus, 
a statistical analysis directly performed on K^jjXk) + ((/>j,ek}lqk may prevent the bias 

term from appearing. 


Therefore, considering all the conjectures above, a better bound may be given by 

1 — nas exp (—K(q, cr)e(q, cr)^) (Bl) 

where n has been replaced with n n, as has been substituted to Cg and b{q, a) has been 
canceled out. 
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Finally, it is worth pointing out that Equation (Bl) will likely not deliver perfect results. 


Indeed, Equation (53) suggests that the probability of sparse support recovery success is actually 


a sum of exponential functions, each function corresponding to a single atom, iteration and 
support. Moreover, possibly different values of e(q,<T)^ should be chosen for each exponential 
function. It is also probably suboptimal to make use of the union bound. 

7 Numerical results 

This section aims at demonstrating that; 

1. SOMP-NS provides significant performance improvements when compared to SOMP pro¬ 
vided that the noise vectors exhibit different variances and that the weights are properly 
chosen. 

2. Depending on the signal model that is chosen, it is possible to accurately estimate the 


optimal weights by using simple closed-formed formulas derived from Equation (Bl) 


3. Equation (Bl) properly predicts the performance improvements obtained when the num¬ 


ber of measurement vectors increases. 

The purpose of the first point is to demonstrate numerically that the gains provided by SOMP- 
NS are significant. The last two points rather focus on the numerical validation of the theoretical 
analysis presented in this paper. With these goals in mind, a particular signal model will be 
chosen. In particular, this signal model will be sufficiently simple to allow the computation of 


the optimal weights when the model (Bl) is assumed to be correct. 


7.1 Particular signal model 

The objective of this section is to particularize Theorem]^ to models for which Xi^k = £i,klJ'X 
{1 < k < K, i € S) and Xi^k = 0 (1 < fe < iF, i 0 5) where the terms £i^k denote Rademacher 
random variables, i.e., random variables that return either 1 or —1 with probability 0.5 for both 
outcomes. 


Two different sign patterns will be distinguished. Sign pattern 1 refers to the case where 
the sign pattern is identical for all the sparse vectors Xk to be recovered, he., ei^k = £i,i for all 
k G [K], i G S and eqq _LL 6 * 2,1 whenever ii / Z 2 . Sign pattern 2 corresponds to the situation 
where the sign pattern is independent for each and within each Xk, i.e., TL 622,^2 whenever 

h / ^2 and/or ki / k 2 - 

In both cases, it is worth mentioning that the absolute values of the entries of each Xk 
are equal to nx- Thus, m.mj^sYlk=i\^j,kWk = IIqIIi/^x and, according to Theorem [o the 
probability of full support recovery is always higher than 


1 - nC|5|_iexp l-\\q\\ji4^K{q,(T) 


IIqIIi/^x 



(58) 


where 6 ' := 0.5 (l — ||$g$^||i) ( 1 —Also, the bound above only holds if 6 ' > b{q, a)/{\\q\\ifj,x)- 
By using the conjecture about the elimination of the bias term b{q, cr) described in Sec¬ 


tion 6 . 6 , one obtains Equation (59), which is reminiscent of Equation (IBII), except that (59) 


does not include the conjectures linked to the term nC\s\-i- If our only objective is to derive the 
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optimal weights according the theoretical model, only the argument of the exponential matters 
so that adjusting the term nC\s\-i has no effect. 


1 - nC\s\-i exp {-\\q\\l^^\K{q, crje'^) (59) 

Let {x) denote the arithmetic mean of the entries of vector x. Then, ||q||i = K‘^{q)‘^ and 


7('^)||2 = 


i = KiigM) Thus, the probability of full support recovery is always higher than 


1 — nC 


|S|-i exp 


.2^2) • ^ 


2((qk^i)kelK]} 

As a particular case, if = 1 (1 < fc < K), the latter probability also rewrites 


(B2) 


1 - nC| 5 |_i exp - 


J2 


2((o-^)fce[i^]) 


(60) 


Let us now focus our attention on the weights that maximize Equation (B2) or, equivalently, 
{(if/iiQM) k^[K])- We can restrict our attention to the maximization of 


(q) 


l^k=i ® 






We define qk = qkO'k so that the expression above also reads 




m2 


iQh 


•(^) 


ke[K]^ 


The quantity q/||g ||2 represents the direction of q so that we know that a global maximizer is 
obtained whenever q and {^/f^k)k£[K] have the same direction. It means that a global maximizer 
is obtained if and only if qk = C{l/ak) where C > 0. This is equivalent to requiring qk = 
C(1/(t|) since qk = qk^k- By choosing (7=1, one concludes that qk = l/cr| provides an optimal 


weighting strategy according to (B2). 


7.2 Simulation frameworks 


Now that the optimal weighting strategy for the signal models envisioned in Section 7.1 has 


been derived, it becomes possible to determine whether the bound (B2) properly predicts the 
impact of the weights on the performance that is achieved. 


Our MATLAB simulation software is available in m- All the scripts needed to generate 
the figures presented in this section are also available. The reader should know that every 
simulation result exposed hereafter has been performed by using single precision floating point 
representations. The reason for such a choice is that single precision arithmetic is faster and 
thus preferred for algorithms intended to run on real-time platform such as SOMP-NS. For the 
same reason, the simulation results are obtained faster when using the single precision format. 


It is assumed that m = 250 and n = 1000. The simulation framework consists of a fixed 
dictionary matrix $ G ]^250xiooo entries were generated on the basis of independent and 

identically distributed Gaussian random variables and then normalized in such a way that each 
column of the matrix exhibits a unit ^2 norm. This matrix is fixed for all the simulations and 
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is available in |12j . 

Two simulation frameworks have been envisioned to demonstrate the three points introduced 
at the very beginning of Section The first framework consists of simulations for the case K = 2 
and addresses the first two objectives while the last framework examines how the probability of 
successful support recovery evolves as K increases. 

7.3 Simulation results for K = 2 

First of all, it is worth pointing out that the performance achieved by SOMP-NS is invariant 
if the weight vector q := {qi,q 2 )"^ is multiplied by a positive constant. Therefore, only the 
angle 9q := arctan(g 2 / 9 i) is investigated. The weights are thus generated on the basis of the 
polar coordinate system qi = cos 9q, q 2 = sin0g where 0° < < 90°. In practice, the grid of 

weighting angles 9q will consist of 21 uniformly spaced angles from 5° to 85°. 

The noise standard deviation vector cr := (cri,(T 2 )'^ is generated on the basis of the polar 
coordinate system cr = (cos 0o-, sin where 9^ describes the orientation of the noise vector. 
The grid of values for 9a is composed of 21 uniformly spaced angles ranging from 20° to 70°. 
Extremely high or low angles have been avoided because they correspond to situations for which 
the noise concentrates essentially on one measurement vector. Therefore, appropriate weighting 
strategies would be able to cancel most of the noise and would lead to probabilities of correct 
support recovery that are too high to be reliably estimated on the basis of a limited number of 
Monte Carlo cases. 


A total of six simulation configurations have been run. Each configuration corresponds to 


one of the two sign patterns described in Section 7.1, to a support size |5| and to a value of 
^x- Simulation cases have been generated for each value of 9a belonging to the grid defined 
beforehand. Once the support size is fixed, the actual support is randomly and uniformly chosen 
among all the possibilities. The support that is simulated is independent for each case. 


Although ||cr||| = 1, the input signal-to-noise ratio (SNR), referred to as SNRin and defined 


in Equation (61), is to be modified by means of the quantity ^x- 


SNRin (dB) = 201ogio (61) 

Table describes all the configurations that have been investigated numerically. The val¬ 
ues of nx have been chosen in such a way that the probability of full support recovery when 
{9q,9a) = (45°, 45°) is approximately equal to 0.1 and 0.3 for the sign pattern 1 and sign pat¬ 
tern 2 respectively. These probabilities have been chosen in such a way that the probability of 
successful full support recovery over the grid defined for (9q,9a) never reaches values so high 
that it cannot be reliably estimated on the basis of a limited number of simulation experiments. 
The lower value of the probability of successful recovery for sign pattern 1 is linked to the fact 
that having a sign pattern independent for each measurement vector, as for sign pattern 2, 
provides performance improvements. This observation is actually reminiscent to what has been 
established in Theorem [8l 

In Table [!} the input SNR, i.e., SNRin, is estimated by generating 2 10^ cases for the 
configuration of interest, then the input SNR (in dB) is computed for each case and the results 
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are finally averaged. The Matlab script implementing this estimation is available in [12] . 

Table 1: Simulation configurations — K = 2 


Configuration ID 

Sign pattern 

|5| 

Aix 

SNRin 

^ cases 

Configuration 1 

1 

10 

2.28 

1.51 dB 

1.0 10^ 

Conhguration 2 

1 

30 

3.19 

5.37 dB 

1.0 10“^ 

Configuration 3 

1 

40 

6.94 

12.15 dB 

2.5 10^ 

Conhguration 4 

2 

10 

2.50 

1.76 dB 

1.0 10^ 

Conhguration 5 

2 

30 

3.06 

5.12 dB 

1.0 10^ 

Conhguration 6 

2 

40 

3.42 

6.76 dB 

1.0 10^ 


As an example, Figure [^depicts the probability of full support recovery for Configuration 2 
as a function of {9g,6a)- 


Probability of full support recovery 



20 40 60 80 


Oq (in degrees) 


Figure 2: Simulation results {K = 2) for Configuration 2 (sign pattern 1, |5| = 30, nx = 3.19 
and SNRin = 5.37 dB) - Probability of full support recovery as a function of 9q and 9a - The 
black curve refers to the optimal weights derived from Equation (B2) - Each pixel of the figure 
has been computed on the basis of 1.0 10^ simulation cases 


The first objective we would like to fulfill is demonstrating that SOMP-NS is capable to 
outperform SOMP whenever the noise standard deviations are not identical for each measure¬ 
ment vector. To do so, we will examinate, for each configuration and for each value of 9a, what 
is the ratio of the probability of failure obtained for 9q = 45°, i.e., the weights corresponding 
to SOMP, to the lowest probability of failure, i.e., that obtained for the truly optimal weights. 
Eigure]^ plots the aforementioned quantity for the all the configurations of Table Note that 
the optimal weights are determined on the basis of the numerical results, the formula 
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obtained on the basis of Equation (B2) is not used. 



Qa (in degrees) 



20 30 40 50 60 70 

Qa (in degrees) 


(a) Results for sign pattern 1 


(b) Results for sign pattern 2 


Figure 3: Simulation results {K = 2) - Ratio of the probability of failure of SOMP to that 
obtained for SOMP-NS when using the optimal weights 


It is observed that the gains provided by the proper application of SOMP-NS are significant, 
especially for low values of the support size. The only case for which the gain is almost nonex¬ 
istent is for sign pattern 1, = 2.50 and |5| = 40, i.e., Configuration 3. This may actually 

be a consequence of the normalization procedure of described previously which ensures that 
the probability of successful support recovery for {9q,9a) = (45°, 45°) is equal to 0.1. The value 
of /ix had to be chosen signihcantly higher than that of the other cases to attain the 0.1 goal, 
which limits the impact of the noise and thus hinders the improvement of the performance by 
modifying the weights. 


Let us now attack the second objective of Section We wish to show that the formula 
qk = l/<7^ obtained in Section 7.1 delivers reliable estimates of the optimal weights. Figure]^ 
summarizes the results. First of all, it is observed that, for the first sign pattern, the solution 
gfc = 1 /o'fc always corresponds to the truly optimal weights while this is not true for the second 
sign pattern. In particular, the discrepancy between the numerical results and the theoretical 
formula (B2) increases as the size of the support augments. 


The first observation is explained by the different sign patterns for each measurement vector. 
Although the weights have been introduced to better filter the influence of the noise, they also 
have an impact on the relative importance of each Xk in the decisions that are taken. Given 
that the sparse vectors Xk to be recovered have identical distributions, it is to be expected that, 
without noise, the optimal weights are obtained by choosing 9q = 45° for symmetry reasons. 
Figure [^displays the simulation results obtained for a configuration identical to Configuration 
6 except that SNRjn = 78.73 dB, i.e., the influence of the noise is negligible. It is observed that 
the interaction of the weights and the sparse vectors to be recovered exists and that the optimal 
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Optimal weighting angles 


Optimal weighting angles 




Figure 4: Simulation results {K = 2) 


optimal weights predicted by Equation (B2) 
derived from the simulation results 


Optimal weighting angles - The black curve denotes the 
The markers correspond to the optimal weights 


weighting angle is equal to 45°. A possible interpretation of the observations above is that the 
optimal weighting strategy is a mixture of the strategy that optimizes the support recovery in 
the noiseless case and of that which minimizes the impact of the noise on the decisions that are 
taken, i.e., qk = Nevertheless, further theoretical developments should be conducted to 

assess whether the proposed interpretation is correct. Finally, it is worth pointing out that the 
phenomenon described above is not observed for sign pattern 1 because xi and X 2 are identical 
in this case. 


The reasons that explain why the optimal weights get closer to 9q = 45° when the support 
size increases, as shown in Figure 4b is not clear and would require additional theoretical in¬ 
vestigations that fall outside of the framework of this paper. 


7.4 Simulation results for increasing values of K 


The final question of whether the proposed theoretical analysis properly conveys the properties 
of SOMP-NS whenever K increases is to be discussed in this section. Our main objective is 


to show that, as predicted by Equation (B2), the probability of failure of SOMP-NS decreases 


linearly with K when it is plotted is semi-logarithmic axes. Indeed, Equation (B2) yields 


logIPfaii < log (nC| 5 |_i) - K~ 




J2 


where Pfaii denotes the probability of failure of correct support recovery. 
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Probability of full support recovery 



6q (in degrees) 


Figure 5: Simulation results {K = 2) for Configuration 6 with SNRin = 78.73 dB - Probability 
of full support recovery as a function of 6q and 9^ - Sign pattern 2 - IS"! = 40 - fix = 15280 - 
# cases = 1.6 10^ 


The simulation framework consists of a fixed weighting strategy for which all the weights 
are equal, i.e., the weighting strategy corresponds to SOMP. For each value of K, the noise 
vector is given by cr = (1,1,..., l)"^ so that it is reminiscent of the noise vector defined 

in Section 7.3 for K = 2. The results are plotted in Figure The configurations that have 
been chosen are directly inspired of those presented in Table The number of cases simulated 
for each curve and each value of K is equal to 4 10^. Some configurations have been discarded 
because they do not exhibit a probability of failure equal to 0 without noise, which is incompat¬ 
ible with the implicit assumption of our theoretical model that no errors are committed in the 
noiseless case. Indeed, if 1 — < 0, i.e., the ERC is the noiseless case is not satisfied, 

then Equation (23) cannot hold. For example. Figure shows that Configuration 6 from Table 
exhibits a non-zero probability of failure without noise. 


The principal observation for Figure is that the slope of Pfaii in semi-logarithmic axes is 
linear with regards to K. This observation provides evidence that the theoretical model conveys 
the behavior of SOMP-NS when K increases. 


7.5 Summary of the numerical results 

The numerical results have revealed the following interesting facts: 


1. SOMP-NS provides significant performance improvements when compared to SOMP pro¬ 
vided that the weights are properly chosen and that the noise variances are different for 
each measurement vector. 


2. The formula qk 


1/cj^ derived from Equation (B2) corresponds to the truly optimal 
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K - Number of measurement vectors 

Figure 6: Probability of full support recovery failure as a function oi K - SP refers to sign 
pattern 


weights whenever 

• The sparse vectors to be recovered are identical. 

• The support size liSI is low enough. 

The exact reason why the formula gets less accurate as the size of the support 

increases remains an open question. 

3. The theoretical analysis properly predicts the characteristics of the decrease of the prob¬ 
ability of failure of SOMP-NS whenever the number of measurement vectors increases. 

Besides the three points above, which answer the three questions enumerated at the begin¬ 
ning of Section the close fitting of = 1 /it| and the numerical results for sign pattern 1 
suggests that the bias term b{q, a) is indeed an artifact of our developments as adding it would 
change the theoretically optimal weights and we would then observe a mismatch between them 
and those obtained by simulation. 


8 Future work 

As suggested in Section 


inequalities (18) and 


6.6, the bias b{q,a) could be removed by avoiding to make use of the 
Using a more subtle approach than the use of the union bound could 


also yield performance improvements. 


Although it appears to be difficult, replacing the term C| 5 |_i by a function that depends 
linearly on I^I would be of great interest, especially since it would close a hole in the literature 


34 































































regarding the performance of the well-known SOMP algorithm that is a particular case of ours. 


Finally, extending the presented analysis by performing a joint statistical analysis of both the 
noise and the sparse signals to be recovered would be of great interest. To begin with, it would 
provide a theoretical model that predicts the truly optimal weights by simultaneously taking 
into account how they impact the sparse signals to be recovered and the noise vectors. Next, it 
would also enable one to comprehend why the discrepancy between the formula qk = 1 and 
the truly optimal weights increases as the support size augments (see Figure]^. Finally, the 
statistical analysis of the sparse signals could replace nC| 5 |_i by nC\s\-i (where n is significantly 


lower than n) as conjectured in Section 6.6 


9 Conclusion 

A novel algorithm entitled SOMP-NS that generalizes SOMP by associating weights with each 
measurement vector has been proposed. A theoretical framework to analyze this algorithm 
has been built. Lower bounds on the probability of full support recovery by means of SOMP- 
NS have been developed in the case where the noise corrupting the measurements is Gaussian. 
Numerical simulations have revealed that the developed theoretical results accurately depict key 
components of the behavior of SOMP-NS while they also fail to capture some of its properties. 
In particular, it has been shown that, under the right circumstances, the weights of SOMP-NS 
can be efficiently optimized on the basis of the proposed theoretical bounds. Finally, the reasons 
that explain why some characteristics of SOMP-NS are not properly conveyed by the theoretical 
analysis have been discussed and potential workarounds to be investigated have been suggested. 
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